4-5/5/2021
Research Workflows
Pipeline
Workflow
R and RStudio
R (base R), tidyverse, ggplot2, RMarkdown
What is R?
R is:
the R language
Why use R?
What is RStudio?
Please start RStudio
RStudio is an integrated development environment (IDE) with panes for: R (console/‘scratchpad’); graphics/visualisation/help
“Why not use Excel?”
Excel is good for some things
R is excellent for analysis and reproducibility…
R can be run on supercomputers, with extremely large datasets…
RStudio overview - INTERACTIVE DEMO
Variables are like named boxes
x <- 1 / 40
x
## [1] 0.025
x ^ 2
## [1] 0.000625
log(x)
## [1] -3.688879
name <- "Samia"
name
## [1] "Samia"
Variable names are documentation
current_temperature = 28.6
subjectID = "GCF_00001236452.1"
GPS_Location = "54N, 36E"
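These descriptive names are all syntactically valid. As a minimal sketch of R's naming rules, base R's `make.names()` converts arbitrary strings into valid names, which makes it a quick way to see why a name is rejected. The candidate names below are made up for illustration.

```r
# make.names() turns arbitrary strings into syntactically valid R names:
# invalid characters become "." and a leading digit gets an "X" prefix
candidate_names <- c("x2", "2x", "my var", "GPS_Location")
valid_names <- make.names(candidate_names)
valid_names  # "x2" "X2x" "my.var" "GPS_Location"

# Names are case sensitive: these are two different variables
Weight <- 70
weight <- 65
Weight == weight  # FALSE
```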
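Variable names can contain letters, digits, periods and underscores ([a-zA-Z0-9_.])
Names cannot start with a digit (x2 is allowed, 2x is not)
Names are case sensitive (Weight is not the same as weight)
Choose a naming style and be consistent (lower_snake, UPPER_SNAKE, lowerCamelCase, UpperCamelCase)
Functions (log(), sin() etc.) ≈ “canned script”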
Many useful functions are built in (e.g. sqrt(), lm(), plot())
Getting help in R
INTERACTIVE DEMO
args(fname) # arguments for fname
?fname # help page for fname
help(fname) # help page for fname
??fname # any mention of fname
help.search("text") # any mention of "text"
vignette("fname") # open the vignette named "fname" (worked examples)
vignette() # show all available vignettes
What will be the value of each variable after each statement in the following program?
mass <- 47.5
age <- 122
mass <- mass * 2.3
age <- age - 20
mass = 47.5, age = 102
mass = 109.25, age = 102
mass = 47.5, age = 122
mass = 109.25, age = 122
USE CHALLENGE LINK ON ETHERPAD
Project management in R
THERE IS NO ONE TRUE WAY (only principles)
Keep raw data separate (data/?)
Keep cleaned data separate (clean_data/?)
RStudio tries to help you manage your projects
R Project concept: files and subdirectory structure
Let’s create a project in RStudio
INTERACTIVE DEMO
RStudio projects: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
We can write code in several ways in RStudio (e.g. directly in the console, or in a script)
We’re going to create a new dataset and R script.
INTERACTIVE DEMO
Download the file from the following link to your data/ directory, and extract it
(the link is also available on the course Etherpad page)
Data files can be inspected in RStudio
data <- read.csv(file = "data/inflammation-01.csv", header = FALSE)
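If the inflammation file is not to hand, the same call can be tried on a few lines of made-up CSV text, because `read.csv()` accepts a `text` argument in place of `file`. This is only a sketch with invented values; it shows that with `header = FALSE` the first line is treated as data and R names the columns V1, V2, and so on.

```r
# A tiny, made-up stand-in for a headerless CSV file such as
# data/inflammation-01.csv (rows = patients, columns = days)
csv_text <- "0,0,1,3\n0,1,2,1\n0,1,1,3"

# header = FALSE: the first line is data, not column names,
# so R invents the names V1, V2, ...
data <- read.csv(text = csv_text, header = FALSE)

dim(data)    # number of rows and columns: 3 4
names(data)  # "V1" "V2" "V3" "V4"
head(data)   # first few rows
```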
Someone gives you a data file that has:
commas (,) as the decimal point character
semicolons (;) as the field separator
How would you open it using read.csv()?
Use the help function and documentation
USE CHALLENGE LINK ON ETHERPAD
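One possible answer, sketched on made-up inline data: `read.csv()` has `sep` and `dec` arguments, and `read.csv2()` is a variant whose defaults are exactly `sep = ";"` and `dec = ","`. The values are invented for illustration; with a real file you would pass `file = ...` instead of `text = ...`.

```r
# Made-up file contents: ";" separates fields, "," is the decimal mark
csv_text <- "1,5;2,25\n3,0;4,75"

# Option 1: set sep and dec explicitly in read.csv()
d1 <- read.csv(text = csv_text, header = FALSE, sep = ";", dec = ",")

# Option 2: read.csv2() uses sep = ";" and dec = "," by default
# (its default is header = TRUE, so we turn that off here)
d2 <- read.csv2(text = csv_text, header = FALSE)

str(d1)  # both columns numeric: V1 = 1.5, 3 and V2 = 2.25, 4.75
```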
INTERACTIVE DEMO
Index data with square brackets [ ]: data[row, column]
data[1, 1]   # First value in dataset
data[30, 20] # Middle value of dataset
The : separator means ‘to’
data[1:4, 1:4] # rows 1 to 4; columns 1 to 4
Leave an index blank to select all rows or all columns
data[5, ]  # row 5
data[, 16] # column 16
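To try these indexing rules without the inflammation data, here is a sketch on a small made-up data frame. Note that `matrix()` fills column by column, so column 4 holds the values 19 to 24.

```r
# A small made-up data frame: 6 "patients" (rows) x 5 "days" (columns)
data <- as.data.frame(matrix(1:30, nrow = 6, ncol = 5))

data[1, 1]      # single value: row 1, column 1 -> 1
data[1:2, 1:3]  # rows 1 to 2; columns 1 to 3
data[5, ]       # all of row 5 (still a one-row data frame)
data[, 4]       # all of column 4 (simplified to a vector): 19 to 24
```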
INTERACTIVE DEMO
R provides useful functions to summarise data
max(data)       # largest value in dataset
max(data[2, ])  # largest value for row (patient) 2
min(data[, 7])  # smallest value on column (day) 7
mean(data[, 7]) # mean value on day 7
sd(data[, 7])   # standard deviation of values on day 7
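A sketch of the same summary functions on a tiny made-up data frame, small enough to check the answers by hand. It also shows one pitfall: `max()` accepts a whole data frame, but `mean()` does not.

```r
# Made-up data: 3 patients (rows) x 4 days (columns)
data <- data.frame(V1 = c(0, 0, 0), V2 = c(1, 2, 1),
                   V3 = c(3, 1, 2), V4 = c(2, 4, 6))

max(data)        # largest value in the whole data frame: 6
max(data[2, ])   # largest value for patient 2: 4
min(data[, 3])   # smallest value on day 3: 1
mean(data[, 4])  # mean on day 4: 4
sd(data[, 4])    # standard deviation on day 4: 2

# mean() does not accept a whole data frame (it warns and returns NA);
# mean(as.matrix(data)) gives the overall mean instead
```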
INTERACTIVE DEMO
Computers exist to do tedious things for us
So apply a function (mean) to each row in the data:
R has several ways to automate this process
apply(X = data, MARGIN = 1, FUN = mean)
MARGIN = 1: rows
MARGIN = 2: columns
rowMeans(data) # same result as apply(data, 1, mean), but faster
colMeans(data) # same result as apply(data, 2, mean)
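A quick way to convince yourself that `apply()` and the shortcut functions agree is to compare them on a small made-up matrix. `apply()` is the more general tool: it works with any function, not only `mean`.

```r
# Made-up matrix: 2 rows x 3 columns, filled column by column
m <- matrix(1:6, nrow = 2)

row_avg <- apply(m, MARGIN = 1, FUN = mean)  # mean of each row: 3 4
col_avg <- apply(m, MARGIN = 2, FUN = mean)  # mean of each column: 1.5 3.5 5.5

# rowMeans()/colMeans() give the same results
all.equal(row_avg, rowMeans(m))  # TRUE
all.equal(col_avg, colMeans(m))  # TRUE

# apply() works with any function, e.g. the maximum of each column
apply(m, 2, max)  # 2 4 6
```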
“The purpose of computing is insight, not numbers.” - Richard Hamming
R has many available graphics packages
INTERACTIVE DEMO
avg_inflammation_patient <- apply(data, 1, mean)
plot(avg_inflammation_patient)
max_day_inflammation <- apply(data, 2, max)
plot(max_day_inflammation)
plot(apply(data, 2, min)) # 3 functions in one!
Can you add plots to your script showing:
Data types and structures in R
R’s data types and structures relate to your own data
R is mostly used for data analysis
R has special types and structures to help you work with data
INTERACTIVE DEMO
Understanding data types, their uses, and how they relate to your own data is key to successful analysis with R
(it’s not just about programming)
What data types would you expect to see?
What examples of data types can you think of from your own experience?
Please write them into the chat
Data types in R
The basic data types in R are atomic
logical: TRUE, FALSE
integer: 3L, 2L, 123456L
numeric (double): 3.0, -23.45, pi
complex: 3+0i, 1+4i
character: "a", 'SWC', "This is not a string"
INTERACTIVE DEMO
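A minimal sketch for checking which type R has actually stored: `typeof()` reports the internal storage type, while `class()` is what most functions dispatch on. Note that a plain number such as `3` is a double; only the `L` suffix makes an integer.

```r
typeof(TRUE)    # "logical"
typeof(2L)      # "integer" (the L suffix makes an integer)
typeof(3)       # "double"  (plain numbers are doubles)
typeof(1 + 4i)  # "complex"
typeof("SWC")   # "character"

class(3)        # "numeric"
is.numeric(2L)  # TRUE: integers count as numeric too
```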
Create examples of data with the following characteristics:
answer, type: logical
height, type: numeric
dog_name, type: character
For each variable, test that it has the data type you intended
R Data Structures
vector
factor
list
data.frame
INTERACTIVE DEMO
Vectors are atomic: they can contain only a single data type
What data type are the following vectors (xx, yy, zz)?
xx <- c(1.7, "a")
yy <- c(TRUE, 2)
zz <- c("a", TRUE)
Options: logical, integer, numeric, character
USE CHALLENGE LINK ON ETHERPAD
R will perform implicit coercion on vectors to make them atomic:
logical → integer → double → complex → character
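The coercion order can be checked directly: combining values with `c()` converts everything to the "highest" type present. The values here are arbitrary examples, chosen so they do not repeat the challenge above.

```r
# Each step up the chain: logical -> integer -> double -> complex -> character
typeof(c(TRUE, 2L))  # "integer"   (TRUE becomes 1L)
typeof(c(2L, 3.5))   # "double"
typeof(c(3.5, 1i))   # "complex"
typeof(c(1i, "b"))   # "character"

c(TRUE, FALSE, 2L)   # 1 0 2
```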
If there are formatting problems with your data, you might not have the type you expect when you import into R
Explicit coercion: as.<type_name>() (e.g. as.numeric(), as.character(), as.logical())
INTERACTIVE DEMO
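A sketch of how a single formatting problem changes the imported type, and two ways to fix it. The column and its "n/a" entry are made up for illustration; `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# A made-up column where one missing value was typed as "n/a"
csv_text <- "weight\n17.2\nn/a\n18.5"
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
typeof(raw$weight)  # "character", not "double"

# Fix 1: explicit coercion; "n/a" cannot be converted, so it becomes NA
# (R warns about this; the warning is silenced here deliberately)
weight <- suppressWarnings(as.numeric(raw$weight))
weight              # 17.2 NA 18.5

# Fix 2: tell read.csv() which strings mean "missing"
clean <- read.csv(text = csv_text, na.strings = "n/a")
typeof(clean$weight)  # "double"

as.integer(3.9)     # 3 (truncates, does not round)
as.logical("TRUE")  # TRUE
```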
Data comes as one of two types:
continuous (e.g. weight <- 17.2; rooms <- 7)
categorical (e.g. grade <- "8"; coat <- "brindled")
This kind of distinction is critical in many applications (e.g. statistical modelling)
INTERACTIVE DEMO
Create a new factor, defining control and case experiments, and inspect the result:
f <- factor(c("case", "control", "case", "control", "case"))
str(f)
## Factor w/ 2 levels "case","control": 1 2 1 2 1
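`str()` shows that a factor stores each value as an integer code (1 2 1 2 1) pointing into its levels. A small sketch using base R's `levels()`, `as.integer()` and `table()` to look at that structure directly:

```r
f <- factor(c("case", "control", "case", "control", "case"))

levels(f)      # "case" "control" (sorted alphabetically by default)
as.integer(f)  # 1 2 1 2 1: the integer code of each value's level
table(f)       # counts per level: case 3, control 2
nlevels(f)     # 2
```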
In some statistical analyses in R it is important that the control level is numbered 1
In RStudio, can you create a factor with the same values, but where the control level is numbered 1?
Lists are like vectors, but can hold any combination of data types
List elements are shown and indexed with [[ ]], and can be named
INTERACTIVE DEMO
# create a list
l <- list(1, 'a', TRUE, matrix(0, nrow = 2, ncol = 2), f)
l_named <- list(a = "SWC", b = 1:4)
> animal[c(2,4,6)]
[1] "o" "k" "y"
> l_named$b
[1] 1 2 3 4
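The difference between `[[ ]]` and `[ ]` on a list is a common stumbling block: double brackets (or `$`) return the element itself, while single brackets return a smaller list. A short sketch using the `l_named` list from above:

```r
l_named <- list(a = "SWC", b = 1:4)

l_named[["b"]]         # the element itself: 1 2 3 4 (same as l_named$b)
l_named["b"]           # a smaller list that still contains element "b"
class(l_named["b"])    # "list"
class(l_named[["b"]])  # "integer"
names(l_named)         # "a" "b"
```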
INTERACTIVE DEMO
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
mask <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
x[mask]
x[x > 7]
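A condition such as `x > 7` is itself just a logical vector, which is why it can be used as a mask. A sketch of a few related idioms on the same vector:

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)

x > 7             # FALSE FALSE TRUE FALSE TRUE: a logical vector
which(x > 7)      # 3 5: positions where the condition is TRUE
sum(x > 7)        # 2: TRUE counts as 1, so this counts the matches
x[x > 5 & x < 7]  # 5.4 6.2: combine conditions with & (and) or | (or)
x[!(x > 7)]       # 5.4 6.2 4.8: ! negates a condition
```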